Datatype-generic programming increases program abstraction and reuse by making functions operate
uniformly across different types. Many approaches to generic programming have been proposed over 
the years, most of them for Haskell, but recently also for dependently typed languages such as Agda. 
Different approaches vary in expressiveness, ease of use, and implementation techniques. 

Some work has been done in comparing the different approaches informally. However, to our 
knowledge there have been no attempts to formally prove relations between different approaches. 
We thus present a formal comparison of generic programming libraries. We show how to formalise 
different approaches in Agda, including a coinductive representation, and then establish theorems 
that relate the approaches to each other. We provide constructive proofs of inclusion of one approach 
in another that can be used to convert between approaches, helping to reduce code duplication across 
different libraries. Our formalisation also helps in providing a clear picture of the potential of each 
approach, especially in relating different generic views and their expressiveness. 



1 Introduction 

There are many forms of genericity [8]. Of particular interest to us is datatype-genericity: behavior that 
is generic over the structure of types. Functional programming languages typically support the definition 
of algebraic datatypes. Due to their algebraic nature, these datatypes can be conveniently encoded in 
a sum-of-products structure. Many functions can be defined to operate on this structure alone; typical 
examples are (de)serialisation, traversals, equality, and enumeration. 

Datatype-generic programming (from here on referred to simply as generic programming) approaches
have been especially prolific in Haskell [24]. PolyP [14], now 15 years old, was the first approach
to generic programming in Haskell, implemented as a pre-processor. Since then, and especially with
the advent of advanced type features such as generalized algebraic datatypes (GADTs) and type
families [28], numerous other approaches have appeared, most of them implemented directly as a library.
This abundance is caused by the lack of a clearly superior approach; each approach has its strengths and
weaknesses, uses different implementation mechanisms, a different generic view [13] (i.e. a different
structural representation of datatypes), or specialises on a particular task. Their sheer number and variety
makes comparisons difficult, and can make prospective generic programming users struggle even before
actually writing a generic program, since first they have to choose a library that is adequate to their needs.

Some effort has been made in comparing different approaches to generic programming from a practical
point of view [10, 26], or to classify approaches [11]. While Generic Haskell [15] has been formalised
in different attempts [27, 29], no formal comparison between modern approaches has been attempted,
leaving a gap in the knowledge of the relationships between each approach. We argue that this gap
should be filled; for starters, a formal comparison provides a theoretical foundation for understanding
different generic programming approaches and their differences and similarities. However, the
contribution is not merely of theoretical interest, since a comparison can also provide means for converting
between approaches. Ironically, code duplication across generic programming libraries is evident: the
same function can be nearly identical in different approaches, yet impossible to reuse, due to the
underlying differences in representation. With a formal proof of inclusion between two approaches a conversion
function can be derived, removing the duplication of generic code.

In this paper we take the initial steps towards a formal comparison of generic programming
approaches:

• We encode five distinct, yet related generic programming libraries in the dependently-typed
programming language Agda [23].

• We encode one specific Haskell library using recursive codes, by means of a coinductive
formulation. We pay special attention to this approach, as it also gives rise to more challenging proofs.

• We show the relations between the approaches, and reason about them. While the inclusion
relations are as expected, the way to convert between approaches is often far from straightforward,
and reveals subtle differences between the approaches. Each inclusion is evidenced by a
conversion function that brings codes from one universe into another, enabling generic function reuse
across different approaches.

• Although fully machine-checked (modulo non-termination), our proofs are in equational reasoning
style and resemble handwritten proofs, remaining clear and elegant.

The rest of this paper proceeds as follows: we first introduce the five approaches we compare by
showing their encoding in Agda in Section 2. We show the inclusion relations between these approaches
in Section 3 and focus on the details of some particular proofs. Finally we conclude in Section 4,
discussing shortcomings of our work and providing directions for future research.



2 Generic programming libraries 

In this section we introduce each of the five libraries that we model. We model all the libraries in Agda, 
and reduce them to their essence, namely to the generic view they encode. We leave out details such as 
implementation mechanisms (e.g. through type classes or GADTs), encoding of meta-information such 
as constructor names (it is generally an issue orthogonal to the library), or modularity. 

All five libraries we choose use a sum-of-products representation of data. One does not use fixed
points, while the other four each use a different form of fixed-point operator. For the latter we show how
to define a mapping function, which can be used to define all the standard recursion morphisms [19], and
is also necessary for the conversion proofs. We leave the comparison of libraries using a view other than
sum-of-products for future work (Section 4). Nonetheless, the libraries we choose are rather different
in their expressiveness, especially regarding support for parametrised datatypes and families of mutually
recursive types, as we will show.

2.1 Regular 

Regular is a simple generic programming library, originally written to support a generic rewriting
system [22]. It has a fixed-point view on data: the generic representation is a pattern-functor, and a
fixed-point operator ties the recursion explicitly. In the original formulation, this is used to ensure that
rewriting meta-variables can only occur at recursive positions of the datatype.

We model every library by defining a Code type that represents the generic universe, and its
interpretation function ⟦_⟧ that maps the codes into Agda types. The universe, interpretation, and fixed-point
operator for Regular follow:



data Code : Set where
  U   : Code
  I   : Code
  _⊕_ : (F G : Code) → Code
  _⊗_ : (F G : Code) → Code

⟦_⟧ : Code → (Set → Set)
⟦ U ⟧     A = ⊤
⟦ I ⟧     A = A
⟦ F ⊕ G ⟧ A = ⟦ F ⟧ A ⊎ ⟦ G ⟧ A
⟦ F ⊗ G ⟧ A = ⟦ F ⟧ A × ⟦ G ⟧ A

data μ (F : Code) : Set where
  ⟨_⟩ : ⟦ F ⟧ (μ F) → μ F

We have codes for units, sums, products, and recursive positions, denoted by I. The interpretation of unit,
sum, and product relies on the Agda types for unit (⊤), disjoint sum (_⊎_), and non-dependent product
(_×_), respectively. The interpretation is parametrised over a Set that is returned in the I case.

Libraries with a fixed-point view on data allow defining a map function. In Regular, this function 
lifts a transformation between sets A and B to a transformation between interpretations parametrised over 
A and B, simply by applying the function in the I case: 

map : (F : Code) → {A B : Set} → (A → B) → ⟦ F ⟧ A → ⟦ F ⟧ B
map U       f _        = tt
map I       f x        = f x
map (F ⊕ G) f (inj₁ x) = inj₁ (map F f x)
map (F ⊕ G) f (inj₂ x) = inj₂ (map G f x)
map (F ⊗ G) f (x , y)  = map F f x , map G f y

The map function can be used to define many other useful generic functions, most notably recursion
morphisms [19]. For example, catamorphisms are defined as follows:

cata : {A : Set} → (F : Code) → (⟦ F ⟧ A → A) → μ F → A
cata C f ⟨ x ⟩ = f (map C (cata C f) x)

Datatypes can be encoded by giving their code, such as NatC for the natural numbers, and then taking
the fixed point. Hence, a natural number is a value of type μ NatC; in the example below, aNat encodes
the number 2:

NatC : Code
NatC = U ⊕ I

aNat : μ NatC
aNat = ⟨ inj₂ ⟨ inj₂ ⟨ inj₁ tt ⟩ ⟩ ⟩
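As a quick check of these definitions, the following self-contained sketch repeats the Regular universe and uses cata to fold an encoded natural number back into Agda's ℕ. The names alg, toℕ and check are hypothetical and not part of the original development, and the imports assume a recent Agda standard library. Because cata passes itself to map, the termination checker cannot see that the recursion is structural; the sketch therefore marks cata with the TERMINATING pragma, which is an assumption of this example rather than something the paper prescribes.

```agda
open import Data.Unit    using (⊤; tt)
open import Data.Sum     using (_⊎_; inj₁; inj₂)
open import Data.Product using (_×_; _,_)
open import Data.Nat     using (ℕ; zero; suc)
open import Relation.Binary.PropositionalEquality using (_≡_; refl)

-- The Regular universe, its interpretation, and its fixed point.
data Code : Set where
  U   : Code
  I   : Code
  _⊕_ : (F G : Code) → Code
  _⊗_ : (F G : Code) → Code

⟦_⟧ : Code → (Set → Set)
⟦ U ⟧     A = ⊤
⟦ I ⟧     A = A
⟦ F ⊕ G ⟧ A = ⟦ F ⟧ A ⊎ ⟦ G ⟧ A
⟦ F ⊗ G ⟧ A = ⟦ F ⟧ A × ⟦ G ⟧ A

data μ (F : Code) : Set where
  ⟨_⟩ : ⟦ F ⟧ (μ F) → μ F

map : (F : Code) → {A B : Set} → (A → B) → ⟦ F ⟧ A → ⟦ F ⟧ B
map U       f _        = tt
map I       f x        = f x
map (F ⊕ G) f (inj₁ x) = inj₁ (map F f x)
map (F ⊕ G) f (inj₂ x) = inj₂ (map G f x)
map (F ⊗ G) f (x , y)  = map F f x , map G f y

-- Assumption: we silence the termination checker instead of restructuring cata.
{-# TERMINATING #-}
cata : {A : Set} → (F : Code) → (⟦ F ⟧ A → A) → μ F → A
cata C f ⟨ x ⟩ = f (map C (cata C f) x)

NatC : Code
NatC = U ⊕ I

aNat : μ NatC
aNat = ⟨ inj₂ ⟨ inj₂ ⟨ inj₁ tt ⟩ ⟩ ⟩

-- Hypothetical example: an algebra that rebuilds a native natural number.
toℕ : μ NatC → ℕ
toℕ = cata NatC alg
  where
    alg : ⟦ NatC ⟧ ℕ → ℕ
    alg (inj₁ tt) = zero
    alg (inj₂ n)  = suc n

-- aNat has two inj₂ layers around inj₁ tt, so it folds to suc (suc zero).
check : toℕ aNat ≡ 2
check = refl
```

The algebra only has to say what to do with one layer of the pattern functor; cata takes care of the recursion through map.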



2.2 PolyP 

PolyP [14] is an early pre-processor approach to generic programming. However, this aspect is not
essential to PolyP, as it still works with an underlying view of Haskell datatypes. This view is very similar
to that of Regular, only that it also abstracts over one datatype parameter, in addition to one recursive
occurrence. PolyP therefore represents types as bifunctors, whereas Regular uses plain functors. The
encoding of PolyP's universe in Agda follows:



data Code : Set where
  U   : Code
  P   : Code
  I   : Code
  _⊕_ : (F G : Code) → Code
  _⊗_ : (F G : Code) → Code
  _⊚_ : (F G : Code) → Code

⟦_⟧ : Code → (Set → Set → Set)
⟦ U ⟧     A R = ⊤
⟦ P ⟧     A R = A
⟦ I ⟧     A R = R
⟦ F ⊕ G ⟧ A R = ⟦ F ⟧ A R ⊎ ⟦ G ⟧ A R
⟦ F ⊗ G ⟧ A R = ⟦ F ⟧ A R × ⟦ G ⟧ A R
⟦ F ⊚ G ⟧ A R = μ F (⟦ G ⟧ A R)

data μ (F : Code) (A : Set) : Set where
  ⟨_⟩ : ⟦ F ⟧ A (μ F A) → μ F A

In the codes, the only differences from Regular are the addition of a P code, for the parameter, and
a code _⊚_ for composition. The interpretation is parametrised over two Sets, one for the parameter
and the other for the recursive position. Composition is interpreted by taking the fixed-point of the left
bifunctor, thereby closing its recursion, and replacing its parameters by the interpretation of the right
bifunctor. There is at least one other plausible interpretation for composition, namely interpreting the
left bifunctor with closed right bifunctors as parameter (⟦ F ⟧ (μ G A) R), but we give the interpretation
taken by the original implementation.

This asymmetric treatment of the parameters to composition is worth a detailed discussion. In PolyP,
the left functor F is first closed under recursion, and its parameter is set to be the interpretation of the
right functor G. The parameter A is used in the interpretation of G, as is the recursive position R. Care
must be taken when using composition to keep in mind the way it is interpreted. For instance, if we have
a code for binary trees with elements at the leaves TreeC, and a code for lists ListC, one might naively
think that the code for trees with lists at the leaves is TreeC ⊚ ListC, but that is not the case. Instead,
the code we are after is (ListC ⊚ P) ⊕ (I ⊗ I). Apart from requiring careful usage, this composition
does not allow us to reuse the code for trees when defining trees with lists (although the resulting code
quite resembles that of trees). The Indexed approach (described in Section 2.4) has a more convenient
interpretation of composition; this subtle difference is revealed explicitly in our conversion from PolyP
to Indexed (Section 3.2).



The fixed-point operator takes a bifunctor and produces a functor, by closing the recursive positions 
and leaving the parameter open. The map operation for bifunctors takes two argument functions, one to 
apply to parameters, and the other to apply to recursive positions: 



map : {A B R S : Set} (F : Code) → (A → B) → (R → S) → ⟦ F ⟧ A R → ⟦ F ⟧ B S
map U       f g _        = tt
map P       f g x        = f x
map I       f g x        = g x
map (F ⊕ G) f g (inj₁ x) = inj₁ (map F f g x)
map (F ⊕ G) f g (inj₂ x) = inj₂ (map G f g x)
map (F ⊗ G) f g (x , y)  = map F f g x , map G f g y
map (F ⊚ G) f g ⟨ x ⟩    = ⟨ map F (map G f g) (map (F ⊚ G) f g) x ⟩



A map over the parameters, pmap, operating on fixed points of bifunctors, can be built from map
trivially:

pmap : {A B : Set} (F : Code) → (A → B) → μ F A → μ F B
pmap F f ⟨ x ⟩ = ⟨ map F f (pmap F f) x ⟩

As an example encoding in PolyP we show the type of non-empty rose trees: 

ListC : Code
ListC = U ⊕ (P ⊗ I)

RoseC : Code
RoseC = P ⊗ (ListC ⊚ I)

sRose : μ RoseC ⊤
sRose = ⟨ tt , ⟨ inj₁ tt ⟩ ⟩

lRose : μ RoseC ⊤
lRose = ⟨ tt , ⟨ inj₂ (sRose , ⟨ inj₂ (sRose , ⟨ inj₁ tt ⟩) ⟩) ⟩ ⟩

We first encode lists in ListC; rose trees are a parameter and a list containing more rose trees (RoseC).
The smallest possible rose tree is sRose, containing a single element and an empty list. A larger tree
lRose contains a parameter and a list with two small rose trees.



2.3 MultiRec



The MultiRec library [25] is also a generalisation of Regular, allowing for multiple recursive posi- 
tions instead of only one. This means that families of mutually recursive datatypes can be encoded in 
MultiRec. For this, types are represented as higher-order (or indexed) functors: 

Indexed : Set → Set₁
Indexed I = I → Set



Codes themselves are parametrised over an index Set, that is used in the I case. Furthermore, we have 
a new code ! for tagging a code with a particular index. The interpretation is parametrised by a function r 
that maps indices (recursive positions) to Sets, and a specific index i that defines which particular position 
we are interested in; since a code can define several types, the interpretation is a function from the index 
of a particular type to its interpretation. For an occurrence of an index I, we retrieve the associated set 
using the function r. Tagging constrains the interpretation to a particular index j, so we check if j is the 
same as i: 



data Code (I : Set) : Set where
  U   : Code I
  I   : I → Code I
  !   : I → Code I
  _⊕_ : (F G : Code I) → Code I
  _⊗_ : (F G : Code I) → Code I

⟦_⟧ : {I : Set} → Code I → Indexed I → Indexed I
⟦ U ⟧     r i = ⊤
⟦ I j ⟧   r i = r j
⟦ ! j ⟧   r i = i ≡ j
⟦ F ⊕ G ⟧ r i = ⟦ F ⟧ r i ⊎ ⟦ G ⟧ r i
⟦ F ⊗ G ⟧ r i = ⟦ F ⟧ r i × ⟦ G ⟧ r i



Mapping is entirely similar to the Regular map, only that the function being mapped is now an 
index-preserving map. Similarly, the fixed-point operator is also indexed: 



data μ {I : Set} (F : Code I) (i : I) : Set where
  ⟨_⟩ : ⟦ F ⟧ (μ F) i → μ F i

map : {I : Set} {R S : Indexed I} (F : Code I)
    → (∀ i → R i → S i) → (∀ i → ⟦ F ⟧ R i → ⟦ F ⟧ S i)
map U       f i _        = tt
map (I j)   f i x        = f j x
map (! j)   f i x        = x
map (F ⊕ G) f i (inj₁ x) = inj₁ (map F f i x)
map (F ⊕ G) f i (inj₂ x) = inj₂ (map G f i x)
map (F ⊗ G) f i (x , y)  = map F f i x , map G f i y

To show an example involving mutually recursive types we encode a zig-zag sequence of even length. 
Consider first the family we wish to encode, inside a mutual block as the datatypes are mutually recur- 
sive: 

mutual
  data Zig : Set where
    zig : Zag → Zig
    end : Zig

  data Zag : Set where
    zag : Zig → Zag

We can encode this family in MultiRec as follows: 

ZigC : Code (⊤ ⊎ ⊤)
ZigC = I (inj₂ tt) ⊕ U

ZagC : Code (⊤ ⊎ ⊤)
ZagC = I (inj₁ tt)

ZigZagC : Code (⊤ ⊎ ⊤)
ZigZagC = (! (inj₁ tt) ⊗ ZigC) ⊕ (! (inj₂ tt) ⊗ ZagC)

zigZagEnd : μ ZigZagC (inj₁ tt)
zigZagEnd = ⟨ inj₁ (refl , inj₁ ⟨ inj₂ (refl , ⟨ inj₁ (refl , inj₂ tt) ⟩) ⟩) ⟩

zigZagEnd encodes the value zig (zag end), as its name suggests. Note how we define the code for each 
type in the family separately (ZigC and ZagC), and then a code ZigZagC for the family, encoding a tagged 
choice between the two types. As a consequence, proofs of index equality (witnessed by refl) are present 
throughout the encoded values. 
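The claim that zigZagEnd encodes zig (zag end) can be checked mechanically. The sketch below is self-contained and rests on a few assumptions: the index parameter is renamed to Ix so that it does not clash with the constructor I, the conversion functions fromZig and fromZag and the test check are hypothetical names that do not appear in the original development, and the imports assume a recent Agda standard library.

```agda
open import Data.Unit    using (⊤; tt)
open import Data.Sum     using (_⊎_; inj₁; inj₂)
open import Data.Product using (_×_; _,_)
open import Relation.Binary.PropositionalEquality using (_≡_; refl)

Indexed : Set → Set₁
Indexed Ix = Ix → Set

-- MultiRec universe; Ix is the index set of the family.
data Code (Ix : Set) : Set where
  U   : Code Ix
  I   : Ix → Code Ix
  !   : Ix → Code Ix
  _⊕_ : (F G : Code Ix) → Code Ix
  _⊗_ : (F G : Code Ix) → Code Ix

⟦_⟧ : {Ix : Set} → Code Ix → Indexed Ix → Indexed Ix
⟦ U ⟧     r i = ⊤
⟦ I j ⟧   r i = r j
⟦ ! j ⟧   r i = i ≡ j
⟦ F ⊕ G ⟧ r i = ⟦ F ⟧ r i ⊎ ⟦ G ⟧ r i
⟦ F ⊗ G ⟧ r i = ⟦ F ⟧ r i × ⟦ G ⟧ r i

data μ {Ix : Set} (F : Code Ix) (i : Ix) : Set where
  ⟨_⟩ : ⟦ F ⟧ (μ F) i → μ F i

mutual
  data Zig : Set where
    zig : Zag → Zig
    end : Zig

  data Zag : Set where
    zag : Zig → Zag

ZigC : Code (⊤ ⊎ ⊤)
ZigC = I (inj₂ tt) ⊕ U

ZagC : Code (⊤ ⊎ ⊤)
ZagC = I (inj₁ tt)

ZigZagC : Code (⊤ ⊎ ⊤)
ZigZagC = (! (inj₁ tt) ⊗ ZigC) ⊕ (! (inj₂ tt) ⊗ ZagC)

zigZagEnd : μ ZigZagC (inj₁ tt)
zigZagEnd = ⟨ inj₁ (refl , inj₁ ⟨ inj₂ (refl , ⟨ inj₁ (refl , inj₂ tt) ⟩) ⟩) ⟩

-- Hypothetical conversions from the native family into its encoding.
-- Each layer first selects the type by its tag (with a refl proof),
-- then selects the constructor inside ZigC or ZagC.
mutual
  fromZig : Zig → μ ZigZagC (inj₁ tt)
  fromZig (zig z) = ⟨ inj₁ (refl , inj₁ (fromZag z)) ⟩
  fromZig end     = ⟨ inj₁ (refl , inj₂ tt) ⟩

  fromZag : Zag → μ ZigZagC (inj₂ tt)
  fromZag (zag z) = ⟨ inj₂ (refl , fromZig z) ⟩

check : fromZig (zig (zag end)) ≡ zigZagEnd
check = refl
```

Both conversion functions are structurally recursive on the native values, so this direction needs no termination pragma.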



2.4 Indexed Functors 

Just like MultiRec can be seen as a generalisation of Regular to multiple recursive positions, the
Indexed approach [16] can be seen as a generalisation of PolyP both to multiple recursive positions
and multiple parameters. In Indexed, datatypes are represented by functors which are not only indexed
on their input (like MultiRec) but also on their output. It is related to other approaches to
dependently-typed generic programming, such as the levitating universe of Epigram 2 [5], or the theory of indexed
containers [2].

The added complexity makes Indexed cumbersome to encode in Haskell, so its original description
was in Agda (although recent developments in GHC's kind system [30] might now allow us to write a
Haskell version of this library). Below we show a subset of its universe; we elide the reindexing, sigma,
and isomorphism operators from the original presentation:






A Formal Comparison of Approaches to Datatype-Generic Programming 



Indexed : Set → Set₁
Indexed I = I → Set

_∣_ : ∀ {I J} → Indexed I → Indexed J → Indexed (I ⊎ J)
(r ∣ s) (inj₁ i) = r i
(r ∣ s) (inj₂ i) = s i

data Code (I O : Set) : Set₁ where
  U   : Code I O
  I   : I → Code I O
  !   : O → Code I O
  _⊕_ : (F G : Code I O) → Code I O
  _⊗_ : (F G : Code I O) → Code I O
  _⊚_ : {M : Set} → (F : Code M O) → (G : Code I M) → Code I O
  Fix : (F : Code (I ⊎ O) O) → Code I O

⟦_⟧ : {I O : Set} → Code I O → Indexed I → Indexed O
⟦ U ⟧     r i = ⊤
⟦ I j ⟧   r i = r j
⟦ ! j ⟧   r i = i ≡ j
⟦ F ⊕ G ⟧ r i = ⟦ F ⟧ r i ⊎ ⟦ G ⟧ r i
⟦ F ⊗ G ⟧ r i = ⟦ F ⟧ r i × ⟦ G ⟧ r i
⟦ F ⊚ G ⟧ r i = ⟦ F ⟧ (⟦ G ⟧ r) i
⟦ Fix F ⟧ r i = μ F r i

data μ {I O : Set} (F : Code (I ⊎ O) O) (r : Indexed I) (o : O) : Set where
  ⟨_⟩ : ⟦ F ⟧ (r ∣ μ F r) o → μ F r o

A major difference from the previous approaches is that the fixed-point operator is contained within 
the universe. Composition is also allowed, and, unlike in PolyP, it does not require taking a fixed 
point. Codes are parametrised over two Sets, which can be thought of as the input set (parameters) 
and output set (types defined). Composition is only possible for codes with composable indices; for 
instance, a family of three types with two parameters can be composed with a family of two types with 
one parameter, resulting in the original family of three types, but now taking only one parameter. The 
fixed-point operator takes a code with tagged input indices (parameters on the left, recursive occurrences 
on the right) and closes the recursive occurrences, producing a code with only parameters as input. 

The map function lifts a transformation r ⇉ s between indexed functors r and s to a transformation
⟦ F ⟧ r ⇉ ⟦ F ⟧ s between interpretations. In the Fix case, care has to be taken to ensure that the mapping
function f is applied to the parameters on the left, and map to the recursive positions on the right:



_⇉_ : ∀ {I} → (R S : Indexed I) → Set
r ⇉ s = (i : _) → r i → s i

_∥_ : ∀ {I J} {A C : Indexed I} {B D : Indexed J} → A ⇉ C → B ⇉ D → (A ∣ B) ⇉ (C ∣ D)
(f ∥ g) (inj₁ x) z = f x z
(f ∥ g) (inj₂ x) z = g x z

map : {I O : Set} {r s : Indexed I} (F : Code I O) → (r ⇉ s) → ⟦ F ⟧ r ⇉ ⟦ F ⟧ s
map U       f i _        = tt
map (I j)   f i x        = f j x
map (! j)   f i x        = x
map (F ⊕ G) f i (inj₁ x) = inj₁ (map F f i x)
map (F ⊕ G) f i (inj₂ x) = inj₂ (map G f i x)
map (F ⊗ G) f i (x , y)  = map F f i x , map G f i y
map (F ⊚ G) f i x        = map F (map G f) i x
map (Fix F) f i ⟨ x ⟩    = ⟨ map F (f ∥ map (Fix F) f) i x ⟩

For more information and examples of how to encode datatypes in Indexed, we refer the reader to the
paper that introduced this approach [16].



2.5 Instant Generics 

InstantGenerics [4] is another approach to generic programming in Haskell with type families. It
distinguishes itself from all the other approaches we have discussed so far in that it does not represent
recursion via a fixed-point combinator. Like Regular, InstantGenerics also supports a generic
rewriting library [21]. To allow meta-variables to occur at any position (i.e. not only in recursive positions),
type-safe runtime casts are performed to determine if the type of the meta-variable matches that of the
expression. InstantGenerics is also rather similar to the generic programming support recently built
into the Glasgow and Utrecht Haskell compilers [17].

In the original encoding of InstantGenerics in Haskell, recursive datatypes are handled through
indirect recursion between the conversion functions (between a datatype and its generic representation)
and the generic functions. We find that the most natural way to model this in Agda is to use
coinduction [7]. This allows us to define infinite codes, and generic functions operating on these codes, while still
passing the termination check. This encoding would also be appropriate for other Haskell approaches
without a fixed-point operator, such as "Generics for the Masses" [9] and LIGD [6]. Although approaches
without a fixed-point operator have trouble expressing recursive morphisms, they have been popular in
Haskell because they easily allow encoding datatypes with irregular forms of recursion (such as mutually
recursive or nested [3] datatypes).

Compared to the previous approaches, the novelties in the universe of InstantGenerics are a code
K for arbitrary Sets, and a code R for tagging recursive codes. We give the interpretation as a datatype
to ensure that it is inductive.¹ The judicious use of the coinduction primitive ♭ in the R case makes the
Agda encoding pass the termination checker, as the definitions remain productive:



data Code : Set₁ where
  U   : Code
  K   : Set → Code
  R   : (C : ∞ Code) → Code
  _⊕_ : (C D : Code) → Code
  _⊗_ : (C D : Code) → Code

data ⟦_⟧ : Code → Set₁ where
  tt   : ⟦ U ⟧
  k    : {A : Set} → A → ⟦ K A ⟧
  rec  : {C : ∞ Code} → ⟦ ♭ C ⟧ → ⟦ R C ⟧
  inj₁ : {C D : Code} → ⟦ C ⟧ → ⟦ C ⊕ D ⟧
  inj₂ : {C D : Code} → ⟦ D ⟧ → ⟦ C ⊕ D ⟧
  _,_  : {C D : Code} → ⟦ C ⟧ → ⟦ D ⟧ → ⟦ C ⊗ D ⟧


Note the encoding of lists in InstantGenerics: 



ListC : Set → Code
ListC A = U ⊕ (K A ⊗ R (♯ ListC A))



The definition for ListC is directly recursive; since it remains productive, it is accepted by the termination 
checker. Due to the lack of fixed points, we cannot write a map function. But we can easily write other 
generic functions, also recursive, such as a traversal that crushes a term into a result: 



crush : {R : Set} (A : Code) → (R → R → R) → (R → R) → R → ⟦ A ⟧ → R
crush U       _⊞_ ⇑ 1 _        = 1
crush (K _)   _⊞_ ⇑ 1 _        = 1
crush (R C)   _⊞_ ⇑ 1 (rec x)  = ⇑ (crush (♭ C) _⊞_ ⇑ 1 x)
crush (C ⊕ D) _⊞_ ⇑ 1 (inj₁ x) = crush C _⊞_ ⇑ 1 x
crush (C ⊕ D) _⊞_ ⇑ 1 (inj₂ x) = crush D _⊞_ ⇑ 1 x
crush (C ⊗ D) _⊞_ ⇑ 1 (x , y)  = (crush C _⊞_ ⇑ 1 x) ⊞ (crush D _⊞_ ⇑ 1 y)

¹ Alternatively, we could use the experimental Agda flag --guardedness-preserving-type-constructors to treat type
constructors as inductive constructors when checking productivity.



Function crush is similar to map in the sense that it can be used to define many generic functions. It
takes three arguments that specify how to combine the results of each constructor argument (_⊞_), how
to adapt the result of a recursive call (⇑), and what to return for constants and constructors with no
arguments (1).² However, crush is unable to change the type of datatype parameters, since InstantGenerics
has no knowledge of parameters.

We can compute the size of a structure as a crush, for instance: 

size : (A : Code) → ⟦ A ⟧ → ℕ
size C = crush C _+_ suc 0

Here we combine multiple results by adding them, increment the total at every recursive call, and ignore 
constants and units for size purposes. We can test that this function behaves as expected on lists: 

aList : ⟦ ListC ⊤ ⟧
aList = inj₂ (k tt , rec (inj₂ (k tt , rec (inj₁ tt))))

testSize : size _ aList ≡ 2
testSize = refl

While a map function cannot be defined like in the previous approaches, traversal and transformation
functions can still be expressed in InstantGenerics. In particular, if one is willing to exchange static
for dynamic type checking, type-safe runtime casts can be performed to compare the types of elements
being mapped against the type expected by the mapping function, resulting in convenient-to-use generic
functions [21]. However, runtime casting is known to result in poor runtime performance, as it prevents
the compiler from performing type-directed optimisations [18].



2.6 Summary 

We have shown an Agda encoding for the five libraries we compare. In order to simplify the proofs for 
the remainder of the paper, we have omitted a few details: 

• In their original formulation, all libraries supported embedding Sets into the universe (like the K
code in InstantGenerics). We omit these for simplicity, except in InstantGenerics where they
are essential for embedding datatype parameters.

• We paid no attention to ease of use of the encodings. Adding isomorphisms to the universe [16],
for instance, would make the encodings easier to use in practice. However, for our purposes of
formal modelling, isomorphisms would serve only to enlarge the proofs, with no added benefit.
Therefore we decided to omit them from the model.

• The variant of Indexed we consider here is significantly simpler than its original presentation [16].
In particular, we omit the sigma code, which is used for encoding indexed types. If we were to
consider this code then Indexed would no longer embed fully into InstantGenerics (Section 3.3),
since the latter does not support indexed types.

² It is worth noting that crush is itself a simplified catamorphism for the Code type.



3 Comparing the approaches



We now proceed to describe how the approaches relate to each other. We show which approaches can
be embedded in other approaches; when we say that approach A embeds into approach B, we mean that
the interpretation of any code defined in approach A has an equivalent interpretation in approach B. The
starting point of an embedding is a code-conversion function that maps codes from approach A into
approach B. Figure 1 presents a graphical view of the embedding relation between the five approaches;
the arrows mean "embeds into". Note that the embedding relation is naturally transitive. As expected,
MultiRec and PolyP both subsume Regular, but they don't subsume each other, since one supports
families of recursive types and the other supports one parameter. They are however both subsumed
by Indexed. Finally, the liberal encoding of InstantGenerics allows encoding at least all the types
supported by the other approaches (even if it doesn't support the same operations on those types, such as
catamorphisms).





[Figure 1: diagram not recoverable from the extracted text. Surviving node labels: PolyP (I₁ P₁ ⊕ ⊗ ⊚), Indexed (Iₙ Pₙ ⊕ ⊗ ⊚ Fix), InstantGenerics (R ⊕ ⊗).]

Figure 1: Embedding relation between the approaches



We will now focus on some of the conversions and their proofs, choosing those that are particularly 
interesting. 



3.1 Regular to PolyP 

We start with the simplest relation: embedding Regular into PolyP. The first step is to convert Regular 
codes into PolyP codes: 

ʳ⇑ᵖ : Codeᵣ → Codeₚ
ʳ⇑ᵖ Uᵣ       = Uₚ
ʳ⇑ᵖ Iᵣ       = Iₚ
ʳ⇑ᵖ (F ⊕ᵣ G) = (ʳ⇑ᵖ F) ⊕ₚ (ʳ⇑ᵖ G)
ʳ⇑ᵖ (F ⊗ᵣ G) = (ʳ⇑ᵖ F) ⊗ₚ (ʳ⇑ᵖ G)

Since all libraries share similar names, we use subscripts to denote which library we are referring to; r for
Regular and p for PolyP. All Regular codes embed trivially into PolyP, so ʳ⇑ᵖ is unsurprising. After
having defined code conversion, we can show that the interpretation of a code in Regular is equivalent
to the converted code in PolyP. We do this by defining an isomorphism between the two interpretations
(and their fixed points). We show one direction of the conversion, from Regular to PolyP:

fromᵣ : {R : Set} (C : Codeᵣ) → ⟦ C ⟧ᵣ R → ⟦ ʳ⇑ᵖ C ⟧ₚ ⊥ R
fromᵣ Uᵣ       tt       = tt
fromᵣ Iᵣ       x        = x
fromᵣ (F ⊕ᵣ G) (inj₁ x) = inj₁ (fromᵣ F x)
fromᵣ (F ⊕ᵣ G) (inj₂ x) = inj₂ (fromᵣ G x)
fromᵣ (F ⊗ᵣ G) (x , y)  = fromᵣ F x , fromᵣ G y

fromμᵣ : (C : Codeᵣ) → μᵣ C → μₚ (ʳ⇑ᵖ C) ⊥
fromμᵣ C ⟨ x ⟩ᵣ = ⟨ fromᵣ C (mapᵣ C (fromμᵣ C) x) ⟩ₚ

Since Regular does not support parameters, we set the PolyP parameter to ⊥ for Regular codes. Function
fromᵣ does the conversion of one layer, while fromμᵣ ties the recursive knot by expanding fixed
points, converting one layer, and mapping itself recursively. Unfortunately fromμᵣ (and indeed all
conversions in this paper involving fixed points) does not pass Agda's termination checker; we provide some
insights on how to address this problem in Section 4.



The conversion in the other direction (toᵣ and toμᵣ) is entirely symmetrical.

Having defined the conversion functions, we now have to prove that they indeed form an isomor- 
phism. We first consider the case without fixed points: 

iso₁ : {R : Set} → (C : Codeᵣ) → (x : ⟦ C ⟧ᵣ R) → toᵣ C (fromᵣ C x) ≡ x
iso₁ Uᵣ       _        = refl
iso₁ Iᵣ       _        = refl
iso₁ (F ⊕ᵣ G) (inj₁ x) = cong inj₁ (iso₁ F x)
iso₁ (F ⊕ᵣ G) (inj₂ x) = cong inj₂ (iso₁ G x)
iso₁ (F ⊗ᵣ G) (x , y)  = cong₂ _,_ (iso₁ F x) (iso₁ G y)

This proof is trivial, and so is its counterpart for the other direction (fromᵣ C (toᵣ C x) ≡ x). We use the
Agda terms refl, sym, and cong for expressing reflexivity, symmetry, and congruence of equivalences,
respectively. In particular cong is used very often as it allows us to set the focus of the proof deeper
inside an expression.
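The omitted direction can be written out along the same lines. The following self-contained sketch gives toᵣ and the dual round-trip proof, here called iso₂ (a hypothetical name for the one-layer counterpart). It makes two simplifying assumptions that are not in the original development: the PolyP universe is restricted to the codes that a converted Regular code can contain (no composition), so no fixed point is needed, and the imports assume a recent Agda standard library.

```agda
open import Data.Empty   using (⊥)
open import Data.Unit    using (⊤; tt)
open import Data.Sum     using (_⊎_; inj₁; inj₂)
open import Data.Product using (_×_; _,_)
open import Relation.Binary.PropositionalEquality using (_≡_; refl; cong; cong₂)

-- Regular universe (Section 2.1), with the r subscript of this section.
data Codeᵣ : Set where
  Uᵣ   : Codeᵣ
  Iᵣ   : Codeᵣ
  _⊕ᵣ_ : (F G : Codeᵣ) → Codeᵣ
  _⊗ᵣ_ : (F G : Codeᵣ) → Codeᵣ

⟦_⟧ᵣ : Codeᵣ → (Set → Set)
⟦ Uᵣ ⟧ᵣ     R = ⊤
⟦ Iᵣ ⟧ᵣ     R = R
⟦ F ⊕ᵣ G ⟧ᵣ R = ⟦ F ⟧ᵣ R ⊎ ⟦ G ⟧ᵣ R
⟦ F ⊗ᵣ G ⟧ᵣ R = ⟦ F ⟧ᵣ R × ⟦ G ⟧ᵣ R

-- Assumption: PolyP without composition, which suffices for converted codes.
data Codeₚ : Set where
  Uₚ   : Codeₚ
  Pₚ   : Codeₚ
  Iₚ   : Codeₚ
  _⊕ₚ_ : (F G : Codeₚ) → Codeₚ
  _⊗ₚ_ : (F G : Codeₚ) → Codeₚ

⟦_⟧ₚ : Codeₚ → (Set → Set → Set)
⟦ Uₚ ⟧ₚ     A R = ⊤
⟦ Pₚ ⟧ₚ     A R = A
⟦ Iₚ ⟧ₚ     A R = R
⟦ F ⊕ₚ G ⟧ₚ A R = ⟦ F ⟧ₚ A R ⊎ ⟦ G ⟧ₚ A R
⟦ F ⊗ₚ G ⟧ₚ A R = ⟦ F ⟧ₚ A R × ⟦ G ⟧ₚ A R

ʳ⇑ᵖ : Codeᵣ → Codeₚ
ʳ⇑ᵖ Uᵣ       = Uₚ
ʳ⇑ᵖ Iᵣ       = Iₚ
ʳ⇑ᵖ (F ⊕ᵣ G) = (ʳ⇑ᵖ F) ⊕ₚ (ʳ⇑ᵖ G)
ʳ⇑ᵖ (F ⊗ᵣ G) = (ʳ⇑ᵖ F) ⊗ₚ (ʳ⇑ᵖ G)

fromᵣ : {R : Set} (C : Codeᵣ) → ⟦ C ⟧ᵣ R → ⟦ ʳ⇑ᵖ C ⟧ₚ ⊥ R
fromᵣ Uᵣ       tt       = tt
fromᵣ Iᵣ       x        = x
fromᵣ (F ⊕ᵣ G) (inj₁ x) = inj₁ (fromᵣ F x)
fromᵣ (F ⊕ᵣ G) (inj₂ x) = inj₂ (fromᵣ G x)
fromᵣ (F ⊗ᵣ G) (x , y)  = fromᵣ F x , fromᵣ G y

-- The symmetrical direction: one layer of PolyP back into Regular.
toᵣ : {R : Set} (C : Codeᵣ) → ⟦ ʳ⇑ᵖ C ⟧ₚ ⊥ R → ⟦ C ⟧ᵣ R
toᵣ Uᵣ       tt       = tt
toᵣ Iᵣ       x        = x
toᵣ (F ⊕ᵣ G) (inj₁ x) = inj₁ (toᵣ F x)
toᵣ (F ⊕ᵣ G) (inj₂ x) = inj₂ (toᵣ G x)
toᵣ (F ⊗ᵣ G) (x , y)  = toᵣ F x , toᵣ G y

-- Dual of iso₁; the implicit R is passed explicitly to help inference.
iso₂ : {R : Set} (C : Codeᵣ) (x : ⟦ ʳ⇑ᵖ C ⟧ₚ ⊥ R) → fromᵣ {R} C (toᵣ {R} C x) ≡ x
iso₂ Uᵣ       tt       = refl
iso₂ Iᵣ       x        = refl
iso₂ (F ⊕ᵣ G) (inj₁ x) = cong inj₁ (iso₂ F x)
iso₂ (F ⊕ᵣ G) (inj₂ x) = cong inj₂ (iso₂ G x)
iso₂ (F ⊗ᵣ G) (x , y)  = cong₂ _,_ (iso₂ F x) (iso₂ G y)
```

Pattern matching on the Regular code first is what lets the interpretation of the converted code reduce, so the value patterns are exactly those of fromᵣ.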

When considering fixed points the proofs become more involved, since recursion has to be taken into
account. However, using the equational reasoning module from the standard library (see the work of
Shin-Cheng Mu et al. [20] for a detailed account on this style of proofs in Agda) we can keep the proofs
readable:

open ≡-Reasoning

isoμ₁ : (C : Codeᵣ) (x : μᵣ C) → toμᵣ C (fromμᵣ C x) ≡ x
isoμ₁ C ⟨ x ⟩ᵣ = cong ⟨_⟩ᵣ $
  begin
    toᵣ C (mapₚ (ʳ⇑ᵖ C) id (toμᵣ C) (fromᵣ C (mapᵣ C (fromμᵣ C) x)))
  ≡⟨ mapCommuteᵖᵣ C _ ⟩
    mapᵣ C (toμᵣ C) (toᵣ C (fromᵣ C (mapᵣ C (fromμᵣ C) x)))
  ≡⟨ cong (mapᵣ C (toμᵣ C)) (iso₁ C _) ⟩
    mapᵣ C (toμᵣ C) (mapᵣ C (fromμᵣ C) x)
  ≡⟨ map∘ᵣ C ⟩
    mapᵣ C (toμᵣ C ∘ fromμᵣ C) x
  ≡⟨ map∀ᵣ C (isoμ₁ C) x ⟩
    mapᵣ C id x
  ≡⟨ mapⁱᵈᵣ C ⟩
    x ∎

To ease the reading of the equational style proofs, we highlight the term(s) that we are focusing on at 
each step. In this proof we start with an argument relating the maps of Regular and PolyP: 

mapCommuteᵖᵣ : {R₁ R₂ : Set} {f : R₁ → R₂} (C : Codeᵣ) (x : ⟦ ʳ⇑ᵖ C ⟧ₚ ⊥ R₁)
             → toᵣ C (mapₚ (ʳ⇑ᵖ C) id f x) ≡ mapᵣ C f (toᵣ C x)

In words, this theorem states that the following two operations over a Regular term x that has been lifted 
into PolyP are equivalent: 

• To map a function f over the recursive positions of x in PolyP with mapₚ and then convert to
Regular;

• To first convert x back into Regular, and then map the function f with mapᵣ.

After this step, we either proceed by a recursive argument (referring to iso₁ or isoμ₁) or by a lemma. For
conciseness, we show only the types of the lemmas:

map∘ᵣ : {A B C : Set} {f : B → C} {g : A → B} (D : Codeᵣ) {x : ⟦ D ⟧ᵣ A}
      → mapᵣ D f (mapᵣ D g x) ≡ mapᵣ D (f ∘ g) x

map∀ᵣ : (C : Codeᵣ) {A B : Set} {f g : A → B}
      → (∀ x → f x ≡ g x) → (∀ x → mapᵣ C f x ≡ mapᵣ C g x)

mapⁱᵈᵣ : ∀ {A} (C : Codeᵣ) {x : ⟦ C ⟧ᵣ A} → mapᵣ C id x ≡ x

These lemmas are standard properties of mapᵣ, namely the functor laws and the fact that mapᵣ preserves
extensional equality (a form of congruence on mapᵣ). All of them are easily proved by induction on the
codes.
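
To illustrate the induction-on-codes argument, the following is a minimal sketch for a simplified Regular-style universe with only unit, recursive position, sum, and product. The names (Code, map, map-id, map-∘, map-∀) and the explicit argument order are choices made for this sketch and may differ from the authors' formalisation; it assumes the Agda standard library.

```agda
module RegularMapLaws where

open import Data.Unit using (⊤; tt)
open import Data.Sum using (_⊎_; inj₁; inj₂)
open import Data.Product using (_×_; _,_)
open import Function using (id; _∘_)
open import Relation.Binary.PropositionalEquality using (_≡_; refl; cong; cong₂)

-- Hypothetical, simplified universe of codes (no constants, no parameters).
data Code : Set where
  U   : Code
  I   : Code
  _⊕_ : Code → Code → Code
  _⊗_ : Code → Code → Code

-- Interpretation of a code as a functor over the recursive positions.
⟦_⟧ : Code → Set → Set
⟦ U     ⟧ A = ⊤
⟦ I     ⟧ A = A
⟦ F ⊕ G ⟧ A = ⟦ F ⟧ A ⊎ ⟦ G ⟧ A
⟦ F ⊗ G ⟧ A = ⟦ F ⟧ A × ⟦ G ⟧ A

map : {A B : Set} (C : Code) → (A → B) → ⟦ C ⟧ A → ⟦ C ⟧ B
map U       f tt       = tt
map I       f x        = f x
map (F ⊕ G) f (inj₁ x) = inj₁ (map F f x)
map (F ⊕ G) f (inj₂ y) = inj₂ (map G f y)
map (F ⊗ G) f (x , y)  = map F f x , map G f y

-- Identity law, by induction on the code.
map-id : {A : Set} (C : Code) (x : ⟦ C ⟧ A) → map C id x ≡ x
map-id U       tt       = refl
map-id I       x        = refl
map-id (F ⊕ G) (inj₁ x) = cong inj₁ (map-id F x)
map-id (F ⊕ G) (inj₂ y) = cong inj₂ (map-id G y)
map-id (F ⊗ G) (x , y)  = cong₂ _,_ (map-id F x) (map-id G y)

-- Composition (fusion) law.
map-∘ : {A B C : Set} (D : Code) (f : B → C) (g : A → B) (x : ⟦ D ⟧ A)
      → map D f (map D g x) ≡ map D (f ∘ g) x
map-∘ U       f g tt       = refl
map-∘ I       f g x        = refl
map-∘ (F ⊕ G) f g (inj₁ x) = cong inj₁ (map-∘ F f g x)
map-∘ (F ⊕ G) f g (inj₂ y) = cong inj₂ (map-∘ G f g y)
map-∘ (F ⊗ G) f g (x , y)  = cong₂ _,_ (map-∘ F f g x) (map-∘ G f g y)

-- map preserves extensional equality of the mapped functions.
map-∀ : {A B : Set} (C : Code) {f g : A → B}
      → (∀ x → f x ≡ g x) → (x : ⟦ C ⟧ A) → map C f x ≡ map C g x
map-∀ U       p tt       = refl
map-∀ I       p x        = p x
map-∀ (F ⊕ G) p (inj₁ x) = cong inj₁ (map-∀ F p x)
map-∀ (F ⊕ G) p (inj₂ y) = cong inj₂ (map-∀ G p y)
map-∀ (F ⊗ G) p (x , y)  = cong₂ _,_ (map-∀ F p x) (map-∀ G p y)
```

Each proof follows the shape of map itself: the base cases hold by reflexivity (or by the hypothesis, in the recursive position), and the sum and product cases lift the inductive hypotheses with cong or cong₂.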

Put together, fromμᵣ, toμᵣ, isoμ₁, and isoμ₂ (the dual of isoμ₁) form an isomorphism that shows how
to embed Regular codes into PolyP codes.
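
The shape of such an isomorphism can be sketched as a record packaging two conversion functions with the two round-trip proofs. The record name Iso and the toy instance boolIso below are hypothetical and serve only as illustration; they are not taken from the formalisation discussed here. The sketch assumes the Agda standard library.

```agda
module IsoSketch where

open import Data.Bool using (Bool; true; false)
open import Data.Unit using (⊤; tt)
open import Data.Sum using (_⊎_; inj₁; inj₂)
open import Relation.Binary.PropositionalEquality using (_≡_; refl)

-- Two functions together with proofs that they are mutually inverse.
record Iso (A B : Set) : Set where
  field
    from : A → B
    to   : B → A
    iso₁ : ∀ x → to (from x) ≡ x
    iso₂ : ∀ y → from (to y) ≡ y

-- Toy instance: Bool is isomorphic to its sum-of-products encoding ⊤ ⊎ ⊤.
boolIso : Iso Bool (⊤ ⊎ ⊤)
boolIso = record { from = f ; to = t ; iso₁ = p ; iso₂ = q }
  where
    f : Bool → ⊤ ⊎ ⊤
    f true  = inj₁ tt
    f false = inj₂ tt

    t : ⊤ ⊎ ⊤ → Bool
    t (inj₁ _) = true
    t (inj₂ _) = false

    p : ∀ x → t (f x) ≡ x
    p true  = refl
    p false = refl

    q : ∀ y → f (t y) ≡ y
    q (inj₁ tt) = refl
    q (inj₂ tt) = refl
```

In the embedding of Regular into PolyP, the two functions and two proofs named in the text play the roles of the four fields.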

3.2 PolyP to Indexed 

We proceed to the conversion between PolyP and Indexed codes. As we mentioned before, particular 
care has to be taken with composition; the remaining codes are trivially converted, so we only show 
composition: 



ₚ⇑ᵢ : Codeₚ → Codeᵢ (⊤ ⊎ ⊤) ⊤

ₚ⇑ᵢ (F ⊚ₚ G) = (Fixᵢ (ₚ⇑ᵢ F)) ⊚ᵢ (ₚ⇑ᵢ G)

We cannot simply take the Indexed composition of the two converted codes because their types do not
allow composition. A PolyP code results in an open Indexed code with one parameter and one recursive
position, therefore of type Codeᵢ (⊤ ⊎ ⊤) ⊤. Taking the fixed point of such a code gives a code of type
Codeᵢ ⊤ ⊤, so we can mimic PolyP's interpretation of composition in our conversion, using the Fixᵢ of
Indexed. In fact, converting PolyP to Indexed helps us understand the interpretation of composition
in PolyP, because the types now show us that there is no way to define a composition other than by
combining it with the fixed-point operator.

Converting composed values from PolyP is then a recursive task, due to the presence of fixed points. 
We first convert the outer functor F, and then map the conversion onto the arguments, recalling that on 
the left we have parameter codes G, while on the right we have recursive occurrences of the original 
composition: 

fromₚ : {A R : Set} (C : Codeₚ) → ⟦ C ⟧ₚ A R → ⟦ ₚ⇑ᵢ C ⟧ᵢ ((const A) ∣ᵢ (const R)) tt

fromₚ (F ⊚ₚ G) ⟨ x ⟩ₚ = ⟨ mapᵢ (ₚ⇑ᵢ F) ((λ _ → fromₚ G) ∥ᵢ (λ _ → fromₚ (F ⊚ₚ G))) tt (fromₚ F x) ⟩ᵢ

We also show the conversion in the opposite direction, which is entirely symmetrical: 

toₚ : {A R : Set} (C : Codeₚ) → ⟦ ₚ⇑ᵢ C ⟧ᵢ ((const A) ∣ᵢ (const R)) tt → ⟦ C ⟧ₚ A R

toₚ (F ⊚ₚ G) ⟨ x ⟩ᵢ = ⟨ toₚ F (mapᵢ (ₚ⇑ᵢ F) ((λ _ → toₚ G) ∥ᵢ (λ _ → toₚ (F ⊚ₚ G))) tt x) ⟩ₚ

The conversion of PolyP fixed points is very similar to the conversion of composition. The main 
difference lies in the functions that we map to the arguments and recursive positions: 

fromμₚ : {A : Set} (C : Codeₚ) → μₚ C A → ⟦ Fixᵢ (ₚ⇑ᵢ C) ⟧ᵢ (const A) tt
fromμₚ C ⟨ x ⟩ₚ = ⟨ mapᵢ (ₚ⇑ᵢ C) ((const id) ∥ᵢ (λ _ → fromμₚ C)) tt (fromₚ C x) ⟩ᵢ

We omit the conversion in the opposite direction for brevity, and also the isomorphism proof. Both
the case for composition and for PolyP fixed points require lengthy proofs using properties of mapᵢ,
which are presented in more detail in the following section.³



3.3 Indexed to InstantGenerics 

As a final example we show how to convert from a fixed-point view to the coinductive representation of
InstantGenerics. Since all fixed-point views embed into Indexed, we need to define only the embedding
of Indexed into InstantGenerics. Since the two universes are less similar, the code transformation
requires more care:



ᵢ⇑ig : {I O : Set} → Codeᵢ I O → (I → Set) → (O → Codeig)
ᵢ⇑ig Uᵢ       r o = Uig
ᵢ⇑ig (Iᵢ i)   r o = Kig (r i)
ᵢ⇑ig (!ᵢ i)   r o = Kig (o ≡ i)
ᵢ⇑ig (F ⊕ᵢ G) r o = (ᵢ⇑ig F r o) ⊕ig (ᵢ⇑ig G r o)
ᵢ⇑ig (F ⊗ᵢ G) r o = (ᵢ⇑ig F r o) ⊗ig (ᵢ⇑ig G r o)
ᵢ⇑ig (F ⊚ᵢ G) r o = Rig (♯ ᵢ⇑ig F (λ i → ⟦ ᵢ⇑ig G r i ⟧ig) o)
ᵢ⇑ig (Fixᵢ F) r o = Rig (♯ ᵢ⇑ig F (r ∣ᵢ (λ i → ⟦ ᵢ⇑ig (Fixᵢ F) r i ⟧ig)) o)

³ This proof and other omitted details can be found in the code available at the first author's webpage (http://dreixel.net).



Unit, sum, and product exist in both universes, so their conversion is trivial. Recursive invocations with Iᵢ
are replaced by simple constants; we lose the ability to abstract over recursive positions, which is in line
with the behavior of InstantGenerics. Tagging is also converted to a constant, trivially inhabited if we
are in the expected output index o, and empty otherwise. Note that since an Indexed code can define
multiple types, but an InstantGenerics code can only represent one type, ᵢ⇑ig effectively produces
multiple InstantGenerics codes, one for each output index of the original Indexed family.

A composition F ⊚ᵢ G is encoded through recursion; the resulting code is the conversion of F, whose
parameters are Indexed G functors. We convert these functors to sets using ᵢ⇑ig to get InstantGenerics
codes, which we then interpret with ⟦_⟧ig.

A fixed point Fixᵢ F is naturally encoded through recursion, in a similar way to composition. The
recursive positions of the fixed point are either: parameters on the left, converted with r as before; or
recursive occurrences on the right, handled by recursively converting the codes with ᵢ⇑ig and interpreting.

Note that for both composition and fixed points we instantiate the function that interprets indices
(the r argument) with an InstantGenerics interpretation. However, r has type I → Set, whereas ⟦_⟧ig
has return type Set₁. If we were to raise Indexed to Set₁, the interpretation function would then have
type I → Set₁, but then we could no longer use it in the Iᵢ case. For now we rely on the Agda flag
--type-in-type, and leave a formally correct solution for future work (see Section 4).

Having the code conversion in place, we can proceed to convert values:



from : {I O : Set} {r : I → Set} (C : Codeᵢ I O) (o : O) → ⟦ C ⟧ᵢ r o → ⟦ ᵢ⇑ig C r o ⟧ig
from Uᵢ       o tt       = ttig
from (Iᵢ i)   o x        = kig x
from (!ᵢ i)   o x        = kig x
from (F ⊕ᵢ G) o (inj₁ x) = inj₁ig (from F o x)
from (F ⊕ᵢ G) o (inj₂ x) = inj₂ig (from G o x)
from (F ⊗ᵢ G) o (x , y)  = (from F o x) ,ig (from G o y)
from (F ⊚ᵢ G) o x        = recig (from F o (mapᵢ F (from G) o x))
from (Fixᵢ F) o ⟨ x ⟩ᵢ   = recig (from F o (mapᵢ F ((λ _ → id) ∥ᵢ (from (Fixᵢ F))) o x))



The cases for composition and fixed point are more challenging because we have to map the conversion
function inside the argument positions; we do this using the mapᵢ function. As usual, the inverse function
to is entirely symmetrical, so we omit it.

It remains to show that the conversion functions form an isomorphism. We show only the two
interesting cases: composition and fixed points. Following previous work [16], we lift composition,
equality, and identity to natural transformations in Indexed (respectively _∘⇉ᵢ_, _≅ᵢ_, and id⇉ᵢ). We
use equational reasoning for the proofs, and highlight the part of the term that we focus on in each line
of the proof:

iso₁ : {I O : Set} (C : Codeᵢ I O) (r : I → Set) → (to {r = r} C ∘⇉ᵢ from C) ≅ᵢ id⇉ᵢ
iso₁ (F ⊚ᵢ G) r o x =
  begin
    mapᵢ F (to G) o (to F o (from F o (mapᵢ F (from G) o x)))
  ≡⟨ cong (mapᵢ F (to G) o) (iso₁ F _ o _) ⟩
    mapᵢ F (to G) o (mapᵢ F (from G) o x)
  ≡⟨ sym (map∘ᵢ F (to G) (from G) o x) ⟩
    mapᵢ F (to G ∘⇉ᵢ from G) o x
  ≡⟨ map∀ᵢ F (iso₁ G r) o x ⟩
    mapᵢ F id⇉ᵢ o x
  ≡⟨ mapⁱᵈᵢ F o x ⟩
    x ∎

The proof for composition is relatively simple: we apply the proof recursively to F, fuse the two
maps, and reason by recursion on G inside the resulting map, which leaves an identity map. The proof for fixed
points is slightly more involved:

iso₁ (Fixᵢ F) r o ⟨ x ⟩ᵢ = cong ⟨_⟩ᵢ $
  begin
    mapᵢ F (id⇉ᵢ ∥ᵢ (to (Fixᵢ F))) o (to F o (from F o (mapᵢ F (id⇉ᵢ ∥ᵢ (from (Fixᵢ F))) o x)))
  ≡⟨ cong (mapᵢ F (id⇉ᵢ ∥ᵢ (to (Fixᵢ F))) o) (iso₁ F _ o _) ⟩
    mapᵢ F (id⇉ᵢ ∥ᵢ (to (Fixᵢ F))) o (mapᵢ F (id⇉ᵢ ∥ᵢ (from (Fixᵢ F))) o x)
  ≡⟨ sym (map∘ᵢ F (id⇉ᵢ ∥ᵢ (to (Fixᵢ F))) (id⇉ᵢ ∥ᵢ (from (Fixᵢ F))) o x) ⟩
    mapᵢ F ((id⇉ᵢ ∥ᵢ (to (Fixᵢ F))) ∘⇉ᵢ (id⇉ᵢ ∥ᵢ (from (Fixᵢ F)))) o x
  ≡⟨ sym (map∀ᵢ F ∥∘ᵢ o x) ⟩
    mapᵢ F ((id⇉ᵢ ∘⇉ᵢ id⇉ᵢ) ∥ᵢ (to (Fixᵢ F) ∘⇉ᵢ from (Fixᵢ F))) o x
  ≡⟨ map∀ᵢ F (∥congᵢ (λ _ _ → refl) (iso₁ (Fixᵢ F) r)) o x ⟩
    mapᵢ F (id⇉ᵢ ∥ᵢ id⇉ᵢ) o x
  ≡⟨ map∀ᵢ F (∥idᵢ (λ _ _ → refl) (λ _ _ → refl)) o x ⟩
    mapᵢ F id⇉ᵢ o x
  ≡⟨ mapⁱᵈᵢ F o x ⟩
    x ∎

We start in the same way as with composition, but once we have fused the maps we have to deal with
the fact that we are mapping distinct functions to the left (arguments) and right (recursive positions). We
proceed with a lemma on disjunctive maps that states that a composition of disjunctions is the disjunction
of the compositions (∥∘ᵢ). Then we are left with a composition of identities on the left, which we solve
with reflexivity, and a composition of to and from on the right, which we solve by induction. Finally, we
show that a disjunction of identities is the identity (with the ∥idᵢ lemma), and that the identity map is the
identity. The lemmas regarding mapᵢ that we require are the following:

map∘ᵢ : {I O : Set} {r s t : Indexed I}
        (C : Codeᵢ I O) (f : s ⇉ᵢ t) (g : r ⇉ᵢ s) → mapᵢ C (f ∘⇉ᵢ g) ≅ᵢ (mapᵢ C f ∘⇉ᵢ mapᵢ C g)

map∀ᵢ : {I O : Set} {r s : Indexed I} {f g : r ⇉ᵢ s}
        (C : Codeᵢ I O) → f ≅ᵢ g → mapᵢ C f ≅ᵢ mapᵢ C g

mapⁱᵈᵢ : {I O : Set} {r : Indexed I} (C : Codeᵢ I O) → mapᵢ {r = r} C id⇉ᵢ ≅ᵢ id⇉ᵢ

Like the lemmas for mapᵣ of Section 3.1, these are trivially proven by induction on the structure of
Indexed codes.
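
The lemmas on disjunctive maps used in the fixed-point proof do not depend on codes at all and can be sketched in isolation. The block below is a minimal, self-contained reconstruction under our own assumptions: the unsubscripted names (Indexed, _⇉_, id⇉, _∘⇉_, _≅_, _∣_, _∥_, ∥∘, ∥id) and the explicit arguments of ∥∘ are chosen for this sketch and need not match the formalisation exactly. It assumes the Agda standard library.

```agda
module DisjunctionLemmas where

open import Data.Sum using (_⊎_; inj₁; inj₂)
open import Relation.Binary.PropositionalEquality using (_≡_; refl)

-- Indexed sets and index-preserving maps between them.
Indexed : Set → Set₁
Indexed I = I → Set

_⇉_ : {I : Set} → Indexed I → Indexed I → Set
r ⇉ s = ∀ i → r i → s i

id⇉ : {I : Set} {r : Indexed I} → r ⇉ r
id⇉ i x = x

_∘⇉_ : {I : Set} {r s t : Indexed I} → s ⇉ t → r ⇉ s → r ⇉ t
(f ∘⇉ g) i x = f i (g i x)

-- Pointwise (extensional) equality of index-preserving maps.
_≅_ : {I : Set} {r s : Indexed I} → r ⇉ s → r ⇉ s → Set
f ≅ g = ∀ i x → f i x ≡ g i x

-- Disjunction of indexed sets, and its lifting to maps.
_∣_ : {I J : Set} → Indexed I → Indexed J → Indexed (I ⊎ J)
(r ∣ s) (inj₁ i) = r i
(r ∣ s) (inj₂ j) = s j

_∥_ : {I J : Set} {r u : Indexed I} {s v : Indexed J}
    → r ⇉ u → s ⇉ v → (r ∣ s) ⇉ (u ∣ v)
(f ∥ g) (inj₁ i) x = f i x
(f ∥ g) (inj₂ j) x = g j x

-- A disjunction of compositions is the composition of the disjunctions.
∥∘ : {I J : Set} {r s t : Indexed I} {u v w : Indexed J}
     (f₁ : s ⇉ t) (g₁ : r ⇉ s) (f₂ : v ⇉ w) (g₂ : u ⇉ v)
   → ((f₁ ∘⇉ g₁) ∥ (f₂ ∘⇉ g₂)) ≅ ((f₁ ∥ f₂) ∘⇉ (g₁ ∥ g₂))
∥∘ f₁ g₁ f₂ g₂ (inj₁ i) x = refl
∥∘ f₁ g₁ f₂ g₂ (inj₂ j) x = refl

-- A disjunction of identities is the identity.
∥id : {I J : Set} {r : Indexed I} {s : Indexed J}
    → (id⇉ {r = r} ∥ id⇉ {r = s}) ≅ id⇉ {r = r ∣ s}
∥id (inj₁ i) x = refl
∥id (inj₂ j) x = refl
```

Both lemmas hold by case analysis on the index alone: once the index is known to be a left or right injection, each side reduces to the same term and reflexivity suffices.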



4 Discussion and future work 

We have compared different generic programming universes by showing the inclusion relation between 
them. This is useful to determine that one approach can encode at least as many datatypes as another 
approach, and also allows for lifting representations between compatible approaches. This also means 
that generic functions from approach B are all applicable in approach A, if A embeds into B, because we 
can bring generic values from approach A into B and apply the function there. However, we cannot make 
statements about the variety of generic functions that can be encoded in each approach. The generic 
map function, for instance, cannot be defined in InstantGenerics, while it is standard in Indexed. 
One possible direction for future research is to devise a formal framework for evaluating what generic 
functions are possible in each universe, adding another dimension to the comparison of the approaches. 
Notably absent from our comparison are libraries with a generic view not based on a sum of products.
In particular, the spine view [12] is radically different from the approaches we model; yet, it is
the basis for a number of popular generic programming libraries. It would be interesting to see how 
these approaches relate to those we have seen, but, at the moment, converting between entirely different 
universes remains a challenge. 

An issue that remains with our modelling is to properly address termination. While our conversion 
functions can be used operationally to enable portability across different approaches, to serve as a formal 
proof they have to be terminating. Since Agda's algorithm for checking termination is highly syntax- 
driven, attempts to convince Agda of termination are likely to clutter the model, making it less easy to 
understand. We have thus decided to postpone such efforts to future work, perhaps relying on sized
types to guarantee termination of our proofs [1].



A related issue that remains to be addressed is our use of — type-in-type in |Section 3.3[ for the 



code conversion function ifr's. It was not immediately clear to us how to solve this issue even with the 
recently added support for universe polymorphism in Agda. 

Nonetheless, we believe that our work is an important first step towards a formal categorisation 
of generic programming libraries. Future approaches can now rely on our formalisation to describe 
precisely the new aspects they introduce, and how the new approach relates to existing ones. In this way 
we can hope for a future of modular generic programming, where a specific library might be constructed 
using components from different approaches, tailored to a particular need while still reusing code from 
existing libraries. 